Early Steps Towards Web Scale Information Extraction with LODIE
Authors
Abstract
Extracting information from a gigantic data source such as the web has been considered a major research challenge, and over the years many different approaches (Etzioni et al. 2004; Banko et al. 2007; Carlson et al. 2010; Freedman and Ramshaw 2011; Nakashole, Theobald, and Weikum 2011) have been proposed. Nevertheless, the current state of the art has mainly addressed tasks for which resources for training are available (for example, the TAP ontology in the paper by Etzioni and colleagues [2004]) or has used generic patterns to extract generic facts (for example, Banko et al. [2007]; OpenCalais.com). The limited availability of resources for training has so far prevented the study of the generalized use of large-scale resources to port to specific user information needs. The linked open data information-extraction (LODIE) project focuses on the study of IE models and algorithms able to perform efficient user-centered web-scale learning by exploiting linked open data (LOD). In this article we highlight the initial steps of the LODIE project, focusing on a specific IE task, wrapper induction (WI), which consists of automatically learning wrappers for uniform web pages, that is, pages from one website, usually generated with the same script and all describing the same type of entity. We show results on the WI task, exploiting linked data obtained from DBpedia as learning material. Linked data is a recom-
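To make the wrapper induction idea concrete, here is a minimal sketch in Python. It is not the LODIE algorithm: it assumes a wrapper is simply a tag path, and that a small set of known entity labels (a hypothetical stand-in for labels obtained from DBpedia) is available as learning material. The names `PathTextParser`, `induce_wrapper`, `apply_wrapper`, `PAGES`, and `GAZETTEER` are illustrative and do not come from the article.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class PathTextParser(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.pairs.append(("/" + "/".join(self.stack), text))


def text_nodes(html):
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


def induce_wrapper(pages, gazetteer):
    """Return the tag path that matches a gazetteer label on the most pages."""
    known = {label.lower() for label in gazetteer}
    hits = Counter()
    for html in pages:
        # Count each path at most once per page.
        hits.update({path for path, text in text_nodes(html) if text.lower() in known})
    return hits.most_common(1)[0][0] if hits else None


def apply_wrapper(path, html):
    """Extract every text node found at the learned path."""
    return [text for p, text in text_nodes(html) if p == path]


# Hypothetical pages from one website, all generated from the same template.
PAGES = [
    "<html><body><h1>Dune</h1><div><span>Frank Herbert</span></div></body></html>",
    "<html><body><h1>Emma</h1><div><span>Jane Austen</span></div></body></html>",
    "<html><body><h1>Unknown Book</h1><div><span>Dune</span></div></body></html>",
]
# Hypothetical labels standing in for book titles taken from linked data.
GAZETTEER = {"Dune", "Emma"}

if __name__ == "__main__":
    wrapper = induce_wrapper(PAGES, GAZETTEER)
    print(wrapper, apply_wrapper(wrapper, PAGES[2]))
```

The third page illustrates why induction across uniform pages helps: its title is absent from the gazetteer and a known label appears in the wrong place, yet the path supported by most pages still extracts the title.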
Similar resources
Web Scale Information Extraction with LODIE
Information Extraction (IE) is the technique for transforming unstructured textual data into structured representation that can be understood by machines. The exponential growth of the Web generates an exceptional quantity of data for which automatic knowledge capture is essential. This work describes the methodology for Web scale Information Extraction adopted by the LODIE project (Linked Open...
LODIE: Linked Open Data for Web-scale Information Extraction
This work analyzes research gaps and challenges for Web-scale Information Extraction and foresees the usage of Linked Open Data as a groundbreaking solution for the field. The paper presents a novel methodology for Web scale Information Extraction which will be the core of the LODIE project (Linked Open Data Information Extraction). LODIE aims to develop Information Extraction techniques able t...
User driven Information Extraction with LODIE
Information Extraction (IE) is the technique for transforming unstructured or semi-structured data into structured representation that can be understood by machines. In this paper we use a user-driven Information Extraction technique to wrap entity-centric Web pages. The user can select concepts and properties of interest from available Linked Data. Given a number of websites containing pages a...
Integrating Open and Closed Information Extraction: Challenges and First Steps
Over the past years, state-of-the-art information extraction (IE) systems such as NELL [5] and ReVerb [9] have achieved impressive results by producing very large knowledge resources at web scale with minimal supervision. However, these resources lack the schema information, exhibit a high degree of ambiguity, and are difficult even for humans to interpret. Working with such resources becomes e...
Towards a Method for Unsupervised Web Information Extraction
The literature provides a variety of techniques to build the information extractors on which some data integration systems rely. Information extraction techniques are usually based on extraction rules that require maintenance and adaptation if web sources change. In this paper, we present our preliminary steps towards a completely unsupervised information extraction technique that searches for ...
Journal title:
- AI Magazine
Volume 36, issue
Pages -
Publication date 2015